{
 "cells": [
  {
   "cell_type": "markdown",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualizing data with matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Sometimes graphs provide the best way to visualize data\n",
    "\n",
    "The **matplotlib** library allows you to draw graphs to help with visualization\n",
    "\n",
    "If we want to visualize data, we will need to load some data into a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "In order to display plots we need to import the **matplotlib** library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A common plot used in data science is the scatter plot, which is used to check the relationship between two columns\n",
    "* If you see dots scattered everywhere, there is little or no correlation between the two columns\n",
    "* If you see something resembling a line, there is a correlation between the two columns\n",
    "\n",
    "You can use the plot method of the DataFrame to draw the scatter plot\n",
    "* kind - the type of graph to draw\n",
    "* x - column to plot on the x-axis\n",
    "* y - column to plot on the y-axis\n",
    "* color - color to use for the graph points\n",
    "* alpha - opacity of the points (0 is transparent, 1 is opaque), useful to show density of points in a scatter plot\n",
    "* title - title of the graph"
   ]
  },
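  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea above can be sketched with a small made-up dataset. This is only an illustration: the names `toy_df`, `related` and `unrelated` are hypothetical and the values are randomly generated, not taken from the flight data\n",
    "\n",
    "One column is built to follow `x` with some noise and the other is generated independently of `x`, so the first plot should resemble a line and the second should look scattered. The `corr` method of the DataFrame gives a rough numeric check using the Pearson correlation coefficient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Made-up data for illustration only (not from the flight dataset)\n",
    "rng = np.random.default_rng(seed=0)\n",
    "x = rng.uniform(0, 100, size=500)\n",
    "toy_df = pd.DataFrame({\n",
    "    'x': x,\n",
    "    'related': 2 * x + rng.normal(0, 10, size=500),  # follows x plus some noise\n",
    "    'unrelated': rng.normal(0, 10, size=500)         # generated independently of x\n",
    "})\n",
    "\n",
    "# Draw both scatter plots side by side\n",
    "fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
    "toy_df.plot(kind='scatter', x='x', y='related', alpha=0.3, ax=axes[0], title='Points follow a line')\n",
    "toy_df.plot(kind='scatter', x='x', y='unrelated', alpha=0.3, ax=axes[1], title='Points scattered everywhere')\n",
    "plt.show()\n",
    "\n",
    "# Pearson correlation coefficients: close to 1 or -1 is strong, close to 0 is weak\n",
    "toy_corr = toy_df.corr()\n",
    "print(toy_corr['x'])"
   ]
  },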
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Check whether there is a relationship between the distance of a flight and how late the flight arrives\n",
    "delays_df.plot(\n",
    "               kind='scatter',\n",
    "               x='DISTANCE',\n",
    "               y='ARR_DELAY',\n",
    "               color='blue',\n",
    "               alpha=0.3,\n",
    "               title='Correlation of arrival and distance'\n",
    "              )\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check whether there is a relationship between how late the flight leaves and how late the flight arrives\n",
    "delays_df.plot(\n",
    "               kind='scatter',\n",
    "               x='DEP_DELAY',\n",
    "               y='ARR_DELAY',\n",
    "               color='blue',\n",
    "               alpha=0.3,\n",
    "               title='Correlation of arrival and departure delay'\n",
    "              )\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "The scatter plots suggest there is little or no correlation between distance and arrival delay, but a strong correlation between departure delay and arrival delay.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}